Condition numbers and scale free graphs

In this work we study the condition number of the least squares matrix corresponding to scale
free networks. We compute a theoretical lower bound of the condition number which proves that
these matrices are ill conditioned. We also analyze several matrices from networks generated with
the linear preferential attachment model, showing that it is very difficult to compute the power
law exponent by the least squares method due to the severe loss of accuracy expected from the
corresponding condition numbers.

PACS numbers: 02.60.Dc Numerical linear algebra; 05.10.Ln Monte Carlo methods; 89.75.-k Complex systems

I. INTRODUCTION



In recent years several networks have been analyzed, such as internet routers,
biological and metabolic networks, and sexual contacts [?], and the node degree
distributions of all of them seem to follow a power law. Also, several models of
graph growth have been presented in order to explain the emergence of this power
law distribution [?]. However, several criticisms have appeared [?], mainly
focusing on sampling bias [?] and the quality of data fitting [14].

Recently, a simple experiment was presented in [17], studying the linear fit on
the log-log scale of computationally generated data with a pure power law
distribution, and a severe bias error was reported (36%, and 29% with
logarithmic bins).

In this work we present an underlying problem which explains those errors:
regrettably, the matrix in the least squares method is ill conditioned. Let $n$
be the maximum degree of the network; we show that the condition number grows at
least as the logarithm of $n$. Moreover, we introduce a parameter $c \in [0, 1]$
and we consider only the node degree distribution on $[cn, n]$ (in fact, this is
a usual procedure, see [18]). Numerical computations show that the situation is
worse when we focus on the tail of the distribution.

Our results complement the ones in [13], where biological networks were
considered and a different statistical problem arose, since in that work the
power law fit was performed with the maximum likelihood method.

We also compute the condition number of the matrix for scale free graphs
generated with the linear preferential attachment model introduced by Barabási
and Albert [?]. We show that the condition number grows as the network size
increases.

II. MAIN RESULTS



A. Condition Number



For a given matrix $A \in \mathbb{R}^{m \times m}$ and a matrix norm $\|\cdot\|$,
the condition number is defined as

$$\mathrm{cond}(A) = \|A\| \, \|A^{-1}\|,$$

with $\mathrm{cond}(A) = \infty$ if $\det(A) = 0$.

Usually, for the 2-norm the condition number is denoted $\mathrm{cond}(A)_2$.
The 2-norm is an operator norm, i.e., for $v \in \mathbb{R}^m$, taking the
Euclidean vector norm

$$\|v\|_2 := \Big( \sum_{i=1}^{m} v_i^2 \Big)^{1/2},$$

we have

$$\|A\|_2 = \sup\{ \|Av\|_2 : \|v\|_2 = 1 \}.$$



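Concerning the condition number, the following results are well known [19]:

$$\mathrm{cond}(A)_2 = \frac{|\lambda_{\max}|}{|\lambda_{\min}|}, \tag{1}$$

where $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and the maximum
eigenvalue (in absolute value) of $A$, assumed symmetric as are all the matrices
considered below, and

$$\frac{1}{\mathrm{cond}(A)_2} = \inf\Big\{ \frac{\|A - S\|_2}{\|A\|_2} : S \text{ singular} \Big\}, \tag{2}$$

which says that $\mathrm{cond}(A)_2$ is the reciprocal of the relative
distance of $A$ to the set of singular matrices.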
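
As a quick numerical illustration of (1) and (2), the following sketch compares the eigenvalue ratio with the value returned by NumPy and builds the nearest singular matrix explicitly. The test matrix `A` and the helper name `cond2_symmetric` are illustrative choices that do not come from the text; the matrix is assumed symmetric and nonsingular.

```python
import numpy as np

def cond2_symmetric(a):
    """2-norm condition number of a symmetric nonsingular matrix, as in (1)."""
    eig = np.abs(np.linalg.eigvalsh(a))  # eigenvalues of a symmetric matrix
    return eig.max() / eig.min()

# Illustrative symmetric positive definite matrix (an assumption, not from the text).
A = np.array([[4.0, 1.0], [1.0, 3.0]])

ratio = cond2_symmetric(A)
reference = np.linalg.cond(A, 2)         # SVD-based value computed by NumPy

# Identity (2): removing the smallest eigen-direction gives a singular matrix S
# at distance |lambda_min| from A, so the relative distance is 1 / cond(A)_2.
w, v = np.linalg.eigh(A)                 # eigenvalues in ascending order
S = A - w[0] * np.outer(v[:, 0], v[:, 0])
rel_dist = np.linalg.norm(A - S, 2) / np.linalg.norm(A, 2)

print(ratio, reference, 1.0 / rel_dist)
```

The three printed numbers should agree up to rounding, since for a symmetric matrix the singular values are the absolute values of the eigenvalues.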

The interest in the condition number for matrices is related to the accuracy of
computations, since it gives a bound for the propagation of the relative error
in the data when a linear system is solved. If $\mathrm{cond}(A) \sim 10^k$,
then $k$ is roughly the number of significant figures we can expect to lose in
computations.

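More precisely, for a general system $Ax = b$, if we consider a perturbation
$\tilde b$ of the right hand side $b$, then calling $\tilde x$ the exact
solution of $A\tilde x = \tilde b$ it can be shown that

$$\frac{\|x - \tilde x\|_2}{\|x\|_2} \le \mathrm{cond}(A)_2 \, \frac{\|b - \tilde b\|_2}{\|b\|_2}.$$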
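
The bound above can be checked on a small example. The sketch below uses a nearly singular $2 \times 2$ matrix and a fixed perturbation of the right hand side; both are arbitrary choices made for illustration and do not come from the text.

```python
import numpy as np

# Nearly singular symmetric matrix, chosen only for illustration.
A = np.array([[1.0, 1.0], [1.0, 1.0001]])
b = np.array([2.0, 2.0001])              # exact solution is x = (1, 1)
b_tilde = b + 1e-6 * np.array([1.0, -0.5])

x = np.linalg.solve(A, b)
x_tilde = np.linalg.solve(A, b_tilde)

rel_err_b = np.linalg.norm(b - b_tilde) / np.linalg.norm(b)
rel_err_x = np.linalg.norm(x - x_tilde) / np.linalg.norm(x)
bound = np.linalg.cond(A, 2) * rel_err_b

print(f"cond = {np.linalg.cond(A, 2):.3e}")
print(f"relative error in b = {rel_err_b:.3e}")
print(f"relative error in x = {rel_err_x:.3e} <= bound = {bound:.3e}")
```

With a condition number of order $10^4$, a relative perturbation of order $10^{-7}$ in the data becomes a relative error of order $10^{-2}$ in the solution, in line with the rule of thumb on lost significant figures.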



A practical rule in statistics is to avoid the least squares method when the
condition number is greater than or equal to 900 (indeed they define
$\kappa(A) = \mathrm{cond}(A)^{1/2}$, and $\kappa > 15$ is a strong sign of
collinearity, see for example [?]).

B. Theoretical Results

Let us consider a graph $G$ with $k$ nodes $x_1, \dots, x_k$, where $d(x_i)$ is
the degree of node $x_i$, that is, the number of links emanating from $x_i$.
Let us define

$$n = \max\{ d(x_i) : 1 \le i \le k \}.$$

For each $j$, $1 \le j \le n$, let $h_j$ be the number of nodes with degree $j$.
The existence of a power law dependence $h(d) = a d^{\gamma}$ is usually
observed in a log-log plot, and computed with the least squares method after a
logarithmic change of variables.

First we assume that the degrees span the full integer interval $[1, n]$. In
this case the matrix $A_n$ corresponding to the least squares fit, regardless of
the measured data, is given by

$$A_n = \begin{pmatrix} n & \sum_{j=1}^{n} \ln(j) \\ \sum_{j=1}^{n} \ln(j) & \sum_{j=1}^{n} \ln^2(j) \end{pmatrix}.$$

In a certain sense, this corresponds to the best situation, where the data span
the full range of variables. The following result estimates the condition number
of $A_n$ when $n \to \infty$:

Theorem II.1 For $n$ large, it holds

$$\mathrm{cond}(A_n)_2 \sim \ln^4(n).$$

Proof: We use here (1). A straightforward computation of the eigenvalues of
$A_n$ gives

$$\lambda_{\max} = \frac{1}{2}\Big[ \Big( n + \sum_{j=1}^{n} \ln^2(j) \Big) + \sqrt{\Delta} \Big], \tag{3}$$

$$\lambda_{\min} = \frac{1}{2}\Big[ \Big( n + \sum_{j=1}^{n} \ln^2(j) \Big) - \sqrt{\Delta} \Big], \tag{4}$$

where

$$\Delta = \Big( n - \sum_{j=1}^{n} \ln^2(j) \Big)^2 + 4 \Big( \sum_{j=1}^{n} \ln(j) \Big)^2.$$

For $n$ large we can write

$$\sum_{j=1}^{n} \ln(j) = n(\ln(n) - 1) + O(\ln(n))$$

and

$$\sum_{j=1}^{n} \ln^2(j) = n(\ln^2(n) - 2\ln(n) + 2) + O(\ln^2(n)).$$

Replacing these expressions in (3) and (4), the trace of $A_n$ behaves as
$n \ln^2(n)$ and its determinant as $n^2$, so we get by taking the limit

$$\lim_{n \to \infty} \frac{\lambda_{\max} / \lambda_{\min}}{\ln^4(n)} = 1.$$

□

Since in practice logarithmic binning is preferred (see for example [3]), due to
the sparsity of measurements at the tail of the distribution, our next result
shows that the corresponding matrix is also ill conditioned. We suppose that the
selected degrees for the computation are of the form $e^j$ with
$1 \le j \le n$. Calling $A^e_n$ the corresponding least squares matrix, we can
write

$$A^e_n = \begin{pmatrix} n & \sum_{j=1}^{n} j \\ \sum_{j=1}^{n} j & \sum_{j=1}^{n} j^2 \end{pmatrix} = \begin{pmatrix} n & \frac{n(n+1)}{2} \\ \frac{n(n+1)}{2} & \frac{n(n+1)(2n+1)}{6} \end{pmatrix},$$

and the following holds:

Theorem II.2 For $n$ large

$$\mathrm{cond}(A^e_n)_2 \sim \frac{4}{3} n^2.$$

Proof: Using again (1), and computing explicitly the eigenvalues of $A^e_n$, we
have

$$\frac{\lambda_{\max}}{\lambda_{\min}} = \frac{7 + 2n^2 + 3n + \sqrt{61 + 25n^2 + 42n + 4n^4 + 12n^3}}{7 + 2n^2 + 3n - \sqrt{61 + 25n^2 + 42n + 4n^4 + 12n^3}}.$$

Hence, for $n$ large

$$\mathrm{cond}(A^e_n)_2 = \frac{\lambda_{\max}}{\lambda_{\min}} \sim \frac{4}{3} n^2.$$

□
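
The statement of Theorem II.2 can be checked numerically. The sketch below builds the matrix for the abscissas $\ln(e^j) = j$, and compares the condition number computed by NumPy with the closed-form eigenvalue ratio from the proof and with $\frac{4}{3} n^2$. The function names are hypothetical, and NumPy is used here in place of the MATLAB computations mentioned in the paper.

```python
import numpy as np

def log_bin_matrix(n):
    """Least squares matrix for the abscissas ln(e^j) = j, 1 <= j <= n."""
    j = np.arange(1, n + 1, dtype=float)
    return np.array([[float(n), j.sum()], [j.sum(), (j ** 2).sum()]])

def closed_form_ratio(n):
    """Eigenvalue ratio lambda_max / lambda_min from the proof of Theorem II.2."""
    n = float(n)
    t = 7 + 2 * n**2 + 3 * n
    root = np.sqrt(61 + 25 * n**2 + 42 * n + 4 * n**4 + 12 * n**3)
    return (t + root) / (t - root)

for n in (10, 100, 1000):
    numeric = np.linalg.cond(log_bin_matrix(n), 2)
    print(n, numeric, closed_form_ratio(n), 4.0 * n**2 / 3.0)
```

The closed form subtracts two nearly equal numbers in the denominator, so for very large $n$ it loses precision in floating point; the values of $n$ used here stay well within the safe range.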

Numerical experiments in the next section suggest that considering a logarithmic
bin of the form $a e^j$ is unnecessary, since the condition number grows almost
independently of $a$; see Table I.



TABLE I: Condition number with logarithmic bins $a e^j$, $1 \le j \le n$

n         a = 0.1                   a = 1                     a = 2
$10^3$    $1.319 \times 10^{6}$     $1.337 \times 10^{6}$     $1.343 \times 10^{6}$
$10^4$    $1.332 \times 10^{8}$     $1.334 \times 10^{8}$     $1.334 \times 10^{8}$
$10^5$    $1.333 \times 10^{10}$    $1.333 \times 10^{10}$    $1.333 \times 10^{10}$
$10^6$    $1.333 \times 10^{12}$    $1.333 \times 10^{12}$    $1.333 \times 10^{12}$


FIG. 1: Condition number of $A_n$ with $n \le 10^5$ (curves for $c = 0$ and $c = 0.1$).



TABLE II: Mean value of condition numbers for LPA graphs with different values of $c$

Nodes     Graphs                c = 0     c = 0.05    c = 0.1
$10^4$    $5 \times 10^4$       113.7     379.7       703.7
$10^5$    $2.5 \times 10^4$     223.5     1058.4      1928.8
$10^6$    $10^4$                409.0     2648.5      4560.0
$10^7$    $10^4$                703.8     5897.6      9369.5



FIG. 2: Condition number of $A_n$ with $0 \le c \le 0.5$.



C. Numerical Simulations 

In this section we present several numerical computations of condition numbers.

We computed the condition number of the matrix $A_n$ numerically by using
MATLAB. We also computed the condition number of the truncated matrix: for each
$n$ we consider the matrix obtained with degree values between $cn$ and $n$.
The results are shown in Figure 1 for $n \le 100000$, $c = 0$ and $c = 0.1$.

We show the dependence on $c$ in Figure 2 for $n = 10^4$ and $n = 10^5$, with
$c$ from 0 to 0.5.
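
A minimal sketch of this computation, written with NumPy rather than MATLAB, is given next. It assumes that the truncated matrix is built from all integer degrees between $cn$ and $n$; the function names `ls_matrix` and `cond_truncated` are hypothetical.

```python
import numpy as np

def ls_matrix(degrees):
    """2x2 normal-equations matrix of a straight-line fit at x = ln(degree)."""
    x = np.log(np.asarray(degrees, dtype=float))
    return np.array([[float(x.size), x.sum()], [x.sum(), (x ** 2).sum()]])

def cond_truncated(n, c=0.0):
    """cond_2 of the matrix built from the integer degrees in [c*n, n]."""
    lowest = max(1, int(np.ceil(c * n)))   # degrees start at 1 when c = 0
    return np.linalg.cond(ls_matrix(np.arange(lowest, n + 1)), 2)

for n in (10**3, 10**4, 10**5):
    print(n, cond_truncated(n, 0.0), cond_truncated(n, 0.1), np.log(n) ** 4)
```

The last printed column is $\ln^4(n)$, included only for comparison with the untruncated case $c = 0$; for these sizes the computed values remain below it, since the asymptotic regime is approached slowly.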

In Table I we show the condition numbers for logarithmic bins of the form
$a e^j$, $1 \le j \le n$, for $n = 10^3$, $10^4$, $10^5$, and $10^6$, and
$a = 0.1$, $a = 1$ and $a = 2$.
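
The dependence on $a$ can be explored with a short sketch. Since $\ln(a e^j) = \ln(a) + j$, the parameter $a$ only shifts the abscissas, and the code works with these logarithms directly to avoid overflow in $e^j$. The function name `cond_log_bins` is hypothetical.

```python
import numpy as np

def cond_log_bins(n, a=1.0):
    """cond_2 of the least squares matrix for degrees a * e^j, 1 <= j <= n."""
    # Work with x_j = ln(a e^j) = ln(a) + j directly, since e^j overflows quickly.
    x = np.log(a) + np.arange(1, n + 1, dtype=float)
    m = np.array([[float(n), x.sum()], [x.sum(), (x ** 2).sum()]])
    return np.linalg.cond(m, 2)

for a in (0.1, 1.0, 2.0):
    print(a, [f"{cond_log_bins(n, a):.3e}" for n in (10**3, 10**4)])
```

For $n = 10^3$ this gives approximately $1.319 \times 10^6$, $1.337 \times 10^6$ and $1.343 \times 10^6$ for $a = 0.1$, $1$ and $2$ respectively, so the effect of $a$ is already small and shrinks further as $n$ grows.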

Finally, we consider the Linear Preferential Attachment model of Barabási and
Albert. This is a model of network growth, where a new node is added with a
link to a previously added node, chosen at random with a probability
proportional to its degree.
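
The growth rule just described can be sketched as follows. This is one possible implementation, not necessarily the one used for the experiments: it assumes a seed graph of two nodes joined by one link, and it keeps a list of link end nodes so that a uniform draw from that list selects a node with probability proportional to its degree. The function name `lpa_degrees` is hypothetical.

```python
import random

def lpa_degrees(num_nodes, seed=None):
    """Degree sequence of a tree grown by linear preferential attachment."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    degree = [1, 1]        # assumed seed graph: nodes 0 and 1 joined by one link
    endpoints = [0, 1]     # every link contributes both of its end nodes
    for new_node in range(2, num_nodes):
        # A uniform draw from `endpoints` picks a node with probability
        # proportional to its current degree.
        target = endpoints[rng.randrange(len(endpoints))]
        degree[target] += 1
        degree.append(1)
        endpoints.append(target)
        endpoints.append(new_node)
    return degree

degrees = lpa_degrees(10_000, seed=0)
print(max(degrees), sum(degrees))
```

Only the degree sequence is stored, which is all that the least squares matrix needs; the memory cost is linear in the number of nodes.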

We generated $5 \times 10^4$ graphs of $10^4$ nodes, $25 \times 10^3$ graphs of
$10^5$ nodes, $10^4$ graphs of $10^6$ nodes, and $10^4$ graphs of $10^7$ nodes,
and computed the condition number of the least squares matrix associated with
each one. We show the distribution of values of the condition number in
Figure 3. Also, in Table II we present the mean values of the condition number
for $c = 0$, $c = 0.05$ and $c = 0.1$.
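
A possible way to compute the condition number associated with one graph is sketched below. The text does not spell out how the matrix is assembled when some degrees are absent, so the sketch assumes that it is built from the distinct degrees actually observed (those $j$ with $h_j > 0$) lying in $[cn, n]$, where $n$ is the maximum degree. The function name and the sample degree sequence are hypothetical.

```python
import numpy as np

def cond_from_degrees(degrees, c=0.0):
    """cond_2 of the least squares matrix built from the observed degrees."""
    d = np.unique(np.asarray(degrees))          # distinct degrees j with h_j > 0
    d = d[d >= c * d.max()]                     # keep the range [c*n, n]
    if d.size < 2:
        raise ValueError("need at least two distinct degrees")
    x = np.log(d.astype(float))
    m = np.array([[float(x.size), x.sum()], [x.sum(), (x ** 2).sum()]])
    return np.linalg.cond(m, 2)

# Hypothetical degree sequence, used only to exercise the function.
sample = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 5, 8]
print(cond_from_degrees(sample, 0.0), cond_from_degrees(sample, 0.25))
```

Even on this tiny example, discarding the lowest degree increases the condition number, which is the same qualitative trend reported for larger values of $c$.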



III. CONCLUSIONS 

We have studied the condition number of the least squares matrix corresponding
to scale free networks. We computed theoretical lower bounds of the condition
numbers showing that they grow at least as the logarithm of the maximum degree
of the network, and numerical simulations support this fact. We also showed that
when the less connected nodes of the network are neglected (a usual practice, in
fact, since the interest is in the tail), things become even worse. Similar
conclusions can be drawn for logarithmic bins.

Finally, for random networks generated with the Linear Preferential Attachment
model, numerical computations of the condition numbers showed severe ill
conditioning of the least squares matrices, even for small networks ($10^4$
nodes). Clearly, in this context it is very difficult to compute the power law
exponent by the least squares method due to the loss of accuracy expected from
the corresponding condition numbers.

FIG. 3: Condition number for graphs made with the LPA model, with $10^4$, $10^5$, $10^6$ and $10^7$ nodes, computed over $5 \times 10^4$, $2.5 \times 10^4$, $10^4$ and $10^4$ graphs, respectively.